
    A ChIP-Seq Benchmark Shows That Sequence Conservation Mainly Improves Detection of Strong Transcription Factor Binding Sites

    Transcription factors are important controllers of gene expression, and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess their performance. It is also believed that information about sequence conservation across different genomes can generally improve the accuracy of motif-based predictors, but it is not clear under what circumstances the use of conservation is most beneficial.

    Here we use published ChIP-Seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods that use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that methods using information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short, information-poor motifs. However, if the motifs are specific and information-rich, simple motif-scanning methods can perform better than conservation-based methods.

    Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.
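    The motif-scanning baseline this benchmark compares against can be illustrated with a minimal sketch: score every window of a sequence against a position weight matrix (PWM) using log-odds against a background model, and compute the motif's information content, which the abstract links to how much conservation helps. The PWM and sequence below are invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical 3-bp motif as per-position base probabilities (illustrative only).
pwm = [
    {"A": 0.8, "C": 0.05, "G": 0.1, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def information_content(pwm):
    """Total information content in bits; higher means a more specific motif."""
    return sum(
        sum(p * math.log2(p / background[b]) for b, p in col.items() if p > 0)
        for col in pwm
    )

def scan(sequence, pwm):
    """Log-odds score of the motif at every position of the sequence."""
    w = len(pwm)
    scores = []
    for i in range(len(sequence) - w + 1):
        s = sum(
            math.log2(pwm[j][sequence[i + j]] / background[sequence[i + j]])
            for j in range(w)
        )
        scores.append((i, s))
    return scores

hits = scan("TTAGCAGCATT", pwm)
best = max(hits, key=lambda t: t[1])  # highest-scoring window is the "AGC" at index 2
```

    An information-rich motif (high total bits) matches few genomic windows by chance, which is the regime where the abstract finds plain scanning competitive with conservation-based methods.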

    A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

    Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVM) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is to find a suitable representation of protein sequences.

    Results: In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence counts of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods.

    Conclusion: The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Top-n-grams are therefore a good building block for protein sequences and can be widely used in many computational biology tasks, such as sequence alignment, domain boundary prediction, the design of knowledge-based potentials and the prediction of protein binding sites.
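    The Top-n-gram construction described above can be sketched as follows, under the assumption that for each profile column the n most frequent amino acids, ordered by decreasing frequency, form one token, and the feature vector counts token occurrences. The toy profile below is invented for illustration.

```python
from collections import Counter

def top_n_grams(profile, n):
    """Convert a frequency profile (one {amino_acid: frequency} dict per
    position) into counts of Top-n-gram tokens."""
    tokens = []
    for column in profile:
        # Keep the n most frequent amino acids in this column, ordered by frequency.
        top = sorted(column, key=column.get, reverse=True)[:n]
        tokens.append("".join(top))
    return Counter(tokens)

# Toy 3-position profile (illustrative values, not real PSI-BLAST output).
profile = [
    {"A": 0.6, "L": 0.3, "G": 0.1},
    {"A": 0.5, "L": 0.4, "G": 0.1},
    {"G": 0.7, "A": 0.2, "L": 0.1},
]
features = top_n_grams(profile, 2)  # counts tokens "AL" and "GA"
```

    In the paper's pipeline these counts form the fixed-dimension vectors that are then reduced with LSA before SVM training; the sketch stops at the counting step.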

    Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements

    Background: Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding, each of which has its own biases.

    Results: Here we show that an approach based on clustering of transcription factor peaks from chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 datasets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription through long-range interactions. The latter clusters were specifically enriched in H3K4me1, but less so in acetylation of lysine 27 on histone H3 or p300 binding.

    Conclusion: By integrating genome-wide data on transcription factor binding and chromatin structure using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
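    The peak-clustering step can be sketched as a simple interval merge: sort peaks by start coordinate and fold any peak that overlaps, or lies within a small gap of, the current cluster into it. The coordinates and the max_gap parameter below are illustrative assumptions, not values from the paper.

```python
def cluster_peaks(peaks, max_gap=0):
    """Merge binding-site peaks (start, end) that overlap or lie within
    max_gap bp of each other into regulatory-element clusters."""
    clusters = []
    for start, end in sorted(peaks):
        if clusters and start <= clusters[-1][1] + max_gap:
            # Peak overlaps the current cluster: extend its right edge.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            # Gap too large: start a new cluster.
            clusters.append([start, end])
    return [tuple(c) for c in clusters]

# Toy peaks from several factors on one chromosome (illustrative coordinates).
peaks = [(100, 150), (140, 200), (500, 560), (555, 600), (900, 950)]
clusters = cluster_peaks(peaks)  # three merged clusters
```

    Clusters built this way from many factors' peak sets can then be intersected with histone-mark data, as the abstract describes, to ask which marks coincide with multi-factor binding.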